Keyword Reduction for Text Categorization using Neighborhood Rough Sets
Abstract
Keyword reduction is a technique that removes less important keywords from the original dataset. Its aim is to decrease the training time of a learning machine and improve text categorization performance. Some researchers have applied rough set theory, a popular computational intelligence tool, to keyword reduction. However, the classical rough set model, which is usually adopted, can only deal with nominal values. In this work, we apply neighborhood rough sets to the keyword reduction problem. A heuristic algorithm is proposed and compared with classical methods such as Information Gain, Mutual Information, and the CHI-square statistic. The experimental results show that the proposed method can outperform the other methods.
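The abstract does not spell out the heuristic. The code below is a minimal Python sketch of one common neighborhood-rough-set reduction scheme: forward greedy selection driven by the dependency (positive-region) measure. It assumes each keyword is a numeric weight per document (for example TF-IDF), which is exactly the case classical rough sets cannot handle without discretization. The Euclidean metric, the neighborhood radius `delta`, the stop-when-no-gain rule, and the helper names `neighborhood`, `dependency`, and `greedy_reduct` are illustrative assumptions, not the paper's algorithm.

```python
import math
from typing import List, Sequence

# Each row is one document; each column is one keyword's numeric weight
# (e.g. TF-IDF). Assumed input layout for this sketch.
Matrix = Sequence[Sequence[float]]


def neighborhood(data: Matrix, i: int, features: List[int], delta: float) -> List[int]:
    """Indices of documents within distance delta of document i, measured only
    on the selected keyword columns (Euclidean distance is an assumption)."""
    xi = data[i]
    return [
        j for j, xj in enumerate(data)
        if math.sqrt(sum((xi[f] - xj[f]) ** 2 for f in features)) <= delta
    ]


def dependency(data: Matrix, labels: Sequence[str], features: List[int], delta: float) -> float:
    """Fraction of documents in the neighborhood positive region, i.e. documents
    whose entire neighborhood shares their category label."""
    positive = sum(
        1 for i in range(len(data))
        if all(labels[j] == labels[i] for j in neighborhood(data, i, features, delta))
    )
    return positive / len(data)


def greedy_reduct(data: Matrix, labels: Sequence[str], delta: float = 0.1) -> List[int]:
    """Hypothetical forward greedy keyword reduction: repeatedly add the keyword
    that most increases dependency; stop when no remaining keyword increases it."""
    remaining = list(range(len(data[0])))
    selected: List[int] = []
    best = dependency(data, labels, selected, delta)
    while remaining:
        gains = {f: dependency(data, labels, selected + [f], delta) for f in remaining}
        candidate = max(remaining, key=gains.__getitem__)  # lowest index wins ties
        if gains[candidate] <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = gains[candidate]
    return selected


# Toy data: 4 documents, 3 keywords; keyword 2 is constant and uninformative.
DOCS = [
    [0.9, 0.1, 0.5],
    [0.8, 0.2, 0.5],
    [0.1, 0.9, 0.5],
    [0.2, 0.8, 0.5],
]
LABELS = ["sports", "sports", "politics", "politics"]

print(greedy_reduct(DOCS, LABELS, delta=0.2))  # -> [0]
```

On this toy matrix, keyword 0 alone already makes every document's neighborhood class-pure (dependency 1.0), so the search stops after one step. The constant keyword 2 puts all documents in one neighborhood and scores 0.0. The radius `delta` controls granularity: larger values merge more documents into each neighborhood and shrink the positive region.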
Similar resources
A New Approach for Knowledge Based Systems Reduction using Rough Sets Theory (RESEARCH NOTE)
The problem of knowledge analysis for decision support systems is the most difficult task of information systems. This paper presents a new approach based on notions of the mathematical theory of Rough Sets to solve this problem. Using these concepts, a systematic approach has been developed to reduce the size of the decision database and extract a reduced rule set from vague and uncertain data. The method ha...
Multiple Sets of Rules for Text Categorization
This paper concerns how multiple sets of rules can be generated using a rough sets-based inductive learning method and how they can be combined for text categorization by using Dempster’s rule of combination. We first propose a boosting-like technique for generating multiple sets of rules based on rough set theory, and then model outcomes inferred from rules as pieces of evidence. The various e...
Reduction of Neighborhood-Based Generalized Rough Sets
Rough set theory is a powerful tool for dealing with uncertainty, granularity, and incompleteness of knowledge in information systems. This paper discusses five types of existing neighborhood-based generalized rough sets. The concepts of minimal neighborhood description and maximal neighborhood description of an element are defined, and by means of the two concepts, the properties and structures...
Heterogeneous Attribute Reduction in Noisy System based on a Generalized Neighborhood Rough Sets Model
Neighborhood Rough Sets (NRS) has been proven to be an efficient tool for heterogeneous attribute reduction. However, most research has focused on complete and noiseless data. In fact, most information systems are noisy, namely, filled with incomplete and inconsistent data. In this paper, we introduce a generalized neighborhood rough sets model, called VPTNRS, to...
Reduction of Rough Set Based on Generalized Neighborhood System Operator
The theory of generalized neighborhood system-based approximation operators plays an important role in the theory of generalized rough sets, since it includes both the neighborhood-based approximation operators and the covering-based approximation operators as special cases. The theory of reduction is one of the most significant directions in rough sets. In this work, the reduction o...
Publication date: 2015